Doris: A Tool for Interactive Exploration of Historic Corpora
Abstract
Insights into a social phenomenon can be gleaned from trends and patterns in corpora of documents associated with that phenomenon. In recent years, computational techniques, mostly based on keywords, have been used to analyze large corpora for this purpose. In this paper, we extend these techniques to incorporate semantic features. We introduce Doris, an interactive exploration tool that combines semantic features with information retrieval techniques to enable exploration of document corpora associated with a social phenomenon. We discuss the semantic techniques and describe an implementation on a corpus of United States (US) presidential speeches. We illustrate, with examples, how the ability to combine syntactic and semantic features in a visualization helps researchers gain insights into the underlying phenomenon.
Similar resources
Geomatics and Architectural Heritage: a Multi-layer Interactive Map of Tuscia-Italy
The main aims of this research are the design and implementation of a multilayered and interactive geomatic map of the cultural heritage of Tuscia, one of the richest and most complex cultural areas of Italy, thanks to the presence of different civilizations, from the Etruscans and Romans to the Middle Ages. Its cultural heritage is very rich, valuable and above all diversified because including tan...
SCHNÄPPER: A Web Toolkit for Exploratory Relation Extraction
We present SCHNÄPPER, a web toolkit for Exploratory Relation Extraction (ERE). The tool allows users to identify relations of interest in a very large text corpus in an exploratory and highly interactive fashion. With this tool, we demonstrate the ease-of-use and intuitive nature of ERE, as well as its applicability to large corpora. We show how users can formulate exploratory, natural language-...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Beyond its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
A Comparative Analysis of Metadiscourse Markers in the Result and Discussion Sections of Literature and Engineering Research Papers
This study compares metadiscourse markers in result and discussion sections of literature and engineering research papers. To this end, 40 research articles (20 literature and 20 engineering) are selected from two major international journals. Based on Hyland’s (2005) model of metadiscourse, the articles are codified in terms of frequency, percentage, and density of interactive and interactiona...
Semantic Pathways: A novel visualisation of varieties of English
Semantic Pathways is a corpus exploration tool with a unique visual interface in which keyword extraction and keyword-based document clustering have been implemented in order to facilitate insight forming. Semantic Pathways combines corpus comparison techniques from Corpus Linguistics with aesthetically driven design and interaction, to produce fluidly interactive information exploration. In add...
Publication date: 2017